Strategies for rescoring keyword search results using word-burst and acoustic features
Authors
Abstract
The identification of keyword queries in speech data from low-resource languages poses a challenge for current methods because speech recognition algorithms lack sufficient training data to produce high-accuracy transcripts. To compensate for these shortcomings, we extract signals from the data that are useful in keyword identification but are not being used by the speech recognizer. These signals take multiple forms: word burstiness, rescored confusion network posteriors, and acoustic/prosodic qualities. Word burstiness denotes the tendency for keywords to occur in bursts within a conversational topic. We employ three different strategies to exploit this information: 1) a four-way classification of keyword hypotheses that targets low-scoring correct hits and high-scoring false alarms, 2) ranking algorithms, and 3) a direct adjustment of keyword hit scores based on hypothesized repetition. We find that interpolating the results of these three strategies in an ensemble provides a reliable way to improve the results of keyword search.
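As a rough illustration of the third strategy and of the ensemble step, the following sketch boosts a keyword hit when the same keyword is hypothesized elsewhere in the same conversation, then linearly interpolates several score lists. The abstract gives no formulas, so the `Hit` fields, the boost rule, the `weight` parameter, and the function names `burst_rescore` and `interpolate` are all assumptions made for this example rather than the authors' method.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Hit:
    """One keyword hypothesis (hypothetical structure for illustration)."""
    keyword: str
    conversation: str
    time: float   # start time in seconds
    score: float  # posterior-like score in [0, 1]


def burst_rescore(hits, weight=0.3):
    """Raise each hit's score toward 1 in proportion to the best score of
    any other hit of the same keyword in the same conversation.

    The boost rule is an assumed stand-in for "direct adjustment based on
    hypothesized repetition"; it is not taken from the paper.
    """
    groups = defaultdict(list)
    for h in hits:
        groups[(h.keyword, h.conversation)].append(h)

    rescored = []
    for h in hits:
        others = [o.score for o in groups[(h.keyword, h.conversation)]
                  if o is not h]
        support = max(others, default=0.0)  # 0.0 when the hit is isolated
        new_score = h.score + weight * support * (1.0 - h.score)
        rescored.append(Hit(h.keyword, h.conversation, h.time, new_score))
    return rescored


def interpolate(score_lists, weights):
    """Weighted average of parallel score lists, one list per strategy."""
    total = sum(weights)
    return [sum(w * s for w, s in zip(weights, scores)) / total
            for scores in zip(*score_lists)]


if __name__ == "__main__":
    hits = [
        Hit("water", "conv1", 1.0, 0.2),
        Hit("water", "conv1", 5.0, 0.8),
        Hit("water", "conv2", 3.0, 0.2),  # isolated: no repetition nearby
    ]
    for h in burst_rescore(hits):
        print(h.conversation, h.time, round(h.score, 3))
```

In this sketch the low-scoring hit in `conv1` gains support from its high-scoring neighbor, while the isolated hit in `conv2` keeps its score, which mirrors the intuition that repeated keywords within a topic are more likely to be correct.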
Similar resources
Spoken Keyword Rescoring and Document Retrieval for Low-resource Languages
For languages that have adequate data for automatic speech recognition (ASR), many keyword search (KWS) and document retrieval (SDR) systems have been developed with near-optimal performance. However, without sufficient training data to produce high-accuracy transcripts, identifying and retrieving queries in speech data from low-resource languages remains challenging. To compensate for th...
Task dependent loss functions in speech recognition: a* search over recognition lattices
A recognition strategy that can be matched to specific system performance criteria such as word error rate or F-measure has recently been found to yield improvements over the usual maximum a-posteriori probability strategy [1] [2] [3]. In this matched-to-the-task strategy a hypothesis is chosen to minimize the expected loss or the Bayes Risk under a loss function defined by a performance measur...
Task Dependent Loss Functions in Speech Recognition: Search over Recognition Lattices
A recognition strategy that can be matched to specific system performance criteria such as word error rate or F-measure has recently been found to yield improvements over the usual maximum a-posteriori probability strategy [1] [2] [3]. In this matched-to-the-task strategy a hypothesis is chosen to minimize the expected loss or the Bayes Risk under a loss function defined by a performance measur...
Direct word graph rescoring using a* search and RNNLM
The usage of Recurrent Neural Network Language Models (RNNLMs) has allowed reaching significant improvements in Automatic Speech Recognition (ASR) tasks. However, to take advantage of their capability for considering long histories, they are usually used to rescore the N-best lists (i.e. it is in practice not possible to use them directly during acoustic trellis search). We propose in this pape...
Echolocation: Using Word-Burst Analysis to Rescore Keyword Search Candidates in Low-Resource Languages